Synchronize using MTLSharedEvents #633
Conversation
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.

diff --git a/lib/mtl/events.jl b/lib/mtl/events.jl
index 374f4285..8d2345ed 100644
--- a/lib/mtl/events.jl
+++ b/lib/mtl/events.jl
@@ -29,9 +29,11 @@ function MTLSharedEvent(dev::MTLDevice)
return obj
end
-function waitUntilSignaledValue(ev::MTLSharedEvent, value, timeoutMS=typemax(UInt64))
- @objc [ev::id{MTLSharedEvent} waitUntilSignaledValue:value::UInt64
- timeoutMS:timeoutMS::UInt64]::Bool
+function waitUntilSignaledValue(ev::MTLSharedEvent, value, timeoutMS = typemax(UInt64))
+ return @objc [
+ ev::id{MTLSharedEvent} waitUntilSignaledValue:value::UInt64
+ timeoutMS:timeoutMS::UInt64
+ ]::Bool
end
## shared event handle
diff --git a/src/state.jl b/src/state.jl
index 0782db68..918731c4 100644
--- a/src/state.jl
+++ b/src/state.jl
@@ -61,7 +61,7 @@ end
Return the `MTLSharedEvent` used to synchronize a queue
"""
function queue_event(queue::MTLCommandQueue)
- get!(task_local_storage(), (:MTLSharedEvent, queue)) do
+ return get!(task_local_storage(), (:MTLSharedEvent, queue)) do
MTLSharedEvent(queue.device)
end::MTLSharedEvent
end
@@ -82,7 +82,7 @@ First-In-First-Out manner, this synchronizes the GPU.
cmdbuf = MTLCommandBuffer(queue)
MTL.encode_signal!(cmdbuf, ev, val)
commit!(cmdbuf)
- MTL.waitUntilSignaledValue(ev,val)
+ MTL.waitUntilSignaledValue(ev, val)
return
end
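As an aside on the wrapper being reformatted above: per its `@objc` signature, `waitUntilSignaledValue` returns a `Bool`, presumably indicating whether the event reached the requested value before `timeoutMS` (milliseconds) elapsed. A minimal usage sketch follows; `device()` as the current-device accessor is an assumption about the surrounding Metal.jl API, and everything else uses names from the diff:

```julia
using Metal
using Metal: MTL

dev = device()                       # assumed Metal.jl accessor for the default MTLDevice
ev = MTL.MTLSharedEvent(dev)         # new shared event; signaledValue starts at 0

# Nothing signals the event here, so this should block for ~5 seconds
# (timeoutMS is in milliseconds) and then return false. The default
# timeout is typemax(UInt64), i.e. effectively wait forever.
ok = MTL.waitUntilSignaledValue(ev, UInt64(1), 5_000)
ok || @warn "event was not signaled within the timeout"
```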
Metal Benchmarks
| Benchmark suite | Current: 4d18869 | Previous: 422c43f | Ratio |
|---|---|---|---|
| latency/precompile | 9811076333 ns | 9811445750 ns | 1.00 |
| latency/ttfp | 3976837999.5 ns | 3973553937 ns | 1.00 |
| latency/import | 1277901146 ns | 1273914917 ns | 1.00 |
| integration/metaldevrt | 820625 ns | 919250 ns | 0.89 |
| integration/byval/slices=1 | 1530333 ns | 1620625 ns | 0.94 |
| integration/byval/slices=3 | 8915625 ns | 9296666.5 ns | 0.96 |
| integration/byval/reference | 1523333 ns | 1601437.5 ns | 0.95 |
| integration/byval/slices=2 | 2546958 ns | 2670791 ns | 0.95 |
| kernel/indexing | 593166.5 ns | 668834 ns | 0.89 |
| kernel/indexing_checked | 578084 ns | 663708 ns | 0.87 |
| kernel/launch | 8000 ns | 9042 ns | 0.88 |
| array/construct | 6000 ns | 6125 ns | 0.98 |
| array/broadcast | 566750 ns | 635500 ns | 0.89 |
| array/random/randn/Float32 | 807937 ns | 815104 ns | 0.99 |
| array/random/randn!/Float32 | 608417 ns | 635375 ns | 0.96 |
| array/random/rand!/Int64 | 543917 ns | 572750 ns | 0.95 |
| array/random/rand!/Float32 | 576000 ns | 597958 ns | 0.96 |
| array/random/rand/Int64 | 777688 ns | 733583 ns | 1.06 |
| array/random/rand/Float32 | 621020.5 ns | 620604 ns | 1.00 |
| array/accumulate/Int64/1d | 1285687.5 ns | 1328542 ns | 0.97 |
| array/accumulate/Int64/dims=1 | 1813167 ns | 1915812.5 ns | 0.95 |
| array/accumulate/Int64/dims=2 | 2151708 ns | 2252917 ns | 0.96 |
| array/accumulate/Int64/dims=1L | 11634083.5 ns | 11629709 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 9686812 ns | 9677895.5 ns | 1.00 |
| array/accumulate/Float32/1d | 1121125 ns | 1230791 ns | 0.91 |
| array/accumulate/Float32/dims=1 | 1550417 ns | 1648666.5 ns | 0.94 |
| array/accumulate/Float32/dims=2 | 1865291 ns | 1970042 ns | 0.95 |
| array/accumulate/Float32/dims=1L | 9850937 ns | 9975938 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 7285729 ns | 7387104.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 1432625 ns | 1582500 ns | 0.91 |
| array/reductions/reduce/Int64/dims=1 | 1045959 ns | 1150791.5 ns | 0.91 |
| array/reductions/reduce/Int64/dims=2 | 1170750 ns | 1302979 ns | 0.90 |
| array/reductions/reduce/Int64/dims=1L | 2074667 ns | 2153834 ns | 0.96 |
| array/reductions/reduce/Int64/dims=2L | 3416958 ns | 3552729 ns | 0.96 |
| array/reductions/reduce/Float32/1d | 988125 ns | 1047958 ns | 0.94 |
| array/reductions/reduce/Float32/dims=1 | 797437.5 ns | 875334 ns | 0.91 |
| array/reductions/reduce/Float32/dims=2 | 745167 ns | 801083 ns | 0.93 |
| array/reductions/reduce/Float32/dims=1L | 1733083 ns | 1800646 ns | 0.96 |
| array/reductions/reduce/Float32/dims=2L | 1754125 ns | 1880520.5 ns | 0.93 |
| array/reductions/mapreduce/Int64/1d | 1445437.5 ns | 1555709 ns | 0.93 |
| array/reductions/mapreduce/Int64/dims=1 | 1057979.5 ns | 1139791 ns | 0.93 |
| array/reductions/mapreduce/Int64/dims=2 | 1179167 ns | 1276833.5 ns | 0.92 |
| array/reductions/mapreduce/Int64/dims=1L | 2092312 ns | 2162875 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=2L | 3438542 ns | 3576375 ns | 0.96 |
| array/reductions/mapreduce/Float32/1d | 1003208 ns | 1077521 ns | 0.93 |
| array/reductions/mapreduce/Float32/dims=1 | 783083 ns | 869417 ns | 0.90 |
| array/reductions/mapreduce/Float32/dims=2 | 747333.5 ns | 796458 ns | 0.94 |
| array/reductions/mapreduce/Float32/dims=1L | 1731000 ns | 1791958 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2L | 1767250 ns | 1892687.5 ns | 0.93 |
| array/private/copyto!/gpu_to_gpu | 629667 ns | 654917 ns | 0.96 |
| array/private/copyto!/cpu_to_gpu | 798334 ns | 822333 ns | 0.97 |
| array/private/copyto!/gpu_to_cpu | 777667 ns | 815083 ns | 0.95 |
| array/private/iteration/findall/int | 1641458 ns | 1662667 ns | 0.99 |
| array/private/iteration/findall/bool | 1443125 ns | 1504500 ns | 0.96 |
| array/private/iteration/findfirst/int | 1817375 ns | 1894188 ns | 0.96 |
| array/private/iteration/findfirst/bool | 1792250 ns | 1818166 ns | 0.99 |
| array/private/iteration/scalar | 3896583.5 ns | 4215375 ns | 0.92 |
| array/private/iteration/logical | 2543354 ns | 2643833 ns | 0.96 |
| array/private/iteration/findmin/1d | 1844125 ns | 1959625 ns | 0.94 |
| array/private/iteration/findmin/2d | 1411020.5 ns | 1525062.5 ns | 0.93 |
| array/private/copy | 550250 ns | 563000 ns | 0.98 |
| array/shared/copyto!/gpu_to_gpu | 85792 ns | 83167 ns | 1.03 |
| array/shared/copyto!/cpu_to_gpu | 86791 ns | 83500 ns | 1.04 |
| array/shared/copyto!/gpu_to_cpu | 82958 ns | 82792 ns | 1.00 |
| array/shared/iteration/findall/int | 1647291 ns | 1691958 ns | 0.97 |
| array/shared/iteration/findall/bool | 1467375 ns | 1524959 ns | 0.96 |
| array/shared/iteration/findfirst/int | 1368521 ns | 1420354.5 ns | 0.96 |
| array/shared/iteration/findfirst/bool | 1362166.5 ns | 1403958 ns | 0.97 |
| array/shared/iteration/scalar | 209375 ns | 158834 ns | 1.32 |
| array/shared/iteration/logical | 2354042 ns | 2536750 ns | 0.93 |
| array/shared/iteration/findmin/1d | 1400750 ns | 1418937.5 ns | 0.99 |
| array/shared/iteration/findmin/2d | 1423666 ns | 1545646 ns | 0.92 |
| array/shared/copy | 251708 ns | 246937.5 ns | 1.02 |
| array/permutedims/4d | 2434292 ns | 2531354 ns | 0.96 |
| array/permutedims/2d | 1169312.5 ns | 1248000 ns | 0.94 |
| array/permutedims/3d | 1723729.5 ns | 1810958 ns | 0.95 |
| metal/synchronization/stream | 19000 ns | 14875 ns | 1.28 |
| metal/synchronization/context | 20292 ns | 15584 ns | 1.30 |
This comment was automatically generated by workflow using github-action-benchmark.
LGTM.
Is there a Metal3->4 migration document anywhere indicating we should switch from MTLCommandBuffer::waitUntilCompleted to a shared event?
Nope. I couldn't find anything in Apple's documentation that mentions full command queue synchronization. It seems like they're more focused on more granular synchronization with the new API. The benchmarks seem to indicate that this is slower for Metal 3, so maybe we leave it as-is for Metal 3 and only transition to event-based synchronization for Metal 4 command queues. What made me believe that this is how they expect us to do it with Metal 4 is that

I'm closing this PR to avoid polluting the PR list, and I linked to this PR in #612 for when I (or someone else) get around to finishing Metal 4 support. We can always reopen if someone thinks we should use MTLSharedEvents.
Synchronization itself seems slower, but all other operations seem slightly faster. Or is this just benchmark noise? It's awfully consistent. And I could imagine the microbenchmark being slower while real applications benefit.
Good point. I'll rerun, and if the results are similar I'll merge.
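For reference, the Metal 3 / Metal 4 split discussed above could look roughly like the following. This is a hypothetical sketch, not code from the PR: `supports_metal4` and `wait_completed` are invented stand-ins for a feature check and for a wrapper around `MTLCommandBuffer`'s `waitUntilCompleted`; the remaining names come from the diff above.

```julia
# Hypothetical dispatch between the two synchronization strategies.
function synchronize(queue::MTLCommandQueue)
    if supports_metal4(queue.device)           # invented feature check
        # Event-based: an empty command buffer signals the shared event;
        # FIFO ordering means all earlier work has finished once it fires.
        ev = queue_event(queue)
        val = ev.signaledValue + 1             # assumes no concurrent signaler
        cmdbuf = MTLCommandBuffer(queue)
        MTL.encode_signal!(cmdbuf, ev, val)
        commit!(cmdbuf)
        MTL.waitUntilSignaledValue(ev, val)
    else
        # Metal 3: commit an empty buffer and block until it completes.
        cmdbuf = MTLCommandBuffer(queue)
        commit!(cmdbuf)
        wait_completed(cmdbuf)                 # invented wrapper name
    end
    return
end
```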
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #633 +/- ##
==========================================
- Coverage 80.97% 80.96% -0.01%
==========================================
Files 61 61
Lines 2712 2722 +10
==========================================
+ Hits 2196 2204 +8
- Misses 516 518 +2

☔ View full report in Codecov by Sentry.
This reverts commit 1942968.
Reusing the same shared event reduces the number of allocations.
Hopefully this slightly improves performance, but more importantly it is in line with how I believe queue synchronization will have to work with Metal 4.
Edit: performance is slightly worse.
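The allocation saving mentioned above comes from caching one `MTLSharedEvent` per (task, queue) pair via the `queue_event` helper in the diff. Reuse is safe as long as each round trip signals and waits on a strictly increasing value; a minimal sketch of that invariant, with the counter management assumed rather than taken from the PR:

```julia
# One cached event, many synchronizations: each iteration signals and
# waits on a fresh, larger value, so stale signals can't satisfy new waits.
function sync_n_times(queue::MTLCommandQueue, n::Integer)
    ev = queue_event(queue)          # cached MTLSharedEvent, allocated once per task
    base = ev.signaledValue          # assumes this task is the only signaler
    for i in 1:n
        cmdbuf = MTLCommandBuffer(queue)
        MTL.encode_signal!(cmdbuf, ev, base + i)   # GPU side: set the value
        commit!(cmdbuf)
        MTL.waitUntilSignaledValue(ev, base + i)   # CPU side: block until set
    end
    return
end
```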